83 research outputs found

    A Novel Set of Directives for Multi-device Programming with OpenMP

    This work was supported by the MEEP project, which has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 946002. The JU receives support from the European Union's Horizon 2020 research and innovation programme and from Spain, Croatia, and Turkey. Peer Reviewed. Postprint (author's final draft).

    Extending OmpSs for OpenCL kernel co-execution in heterogeneous systems

    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    Heterogeneous systems offer very high potential performance but are difficult to program. OmpSs is a well-known framework for task-based parallel applications and an interesting tool for simplifying the programming of these systems. However, it does not support the co-execution of a single OpenCL kernel instance on several compute devices. To overcome this limitation, this paper presents an extension of the OmpSs framework that meets two main objectives: the automatic division of datasets among several devices and the management of their memory address spaces. To adapt to different kinds of applications, the data division can be performed by the novel HGuided load-balancing algorithm or by the well-known Static and Dynamic algorithms. All this is accomplished with negligible impact on the programming. Experimental results reveal that there is always one load-balancing algorithm that improves the performance and energy consumption of the system.

    This work has been supported by the University of Cantabria under grant CVE-2014-18166; the Generalitat de Catalunya under grant 2014-SGR-1051; the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2016-76635-C2-2-R (AEI/FEDER, UE) and TIN2015-65316-P; the Spanish Government through the Programa Severo Ochoa (SEV-2015-0493); the European Research Council under grant agreement No 321253; the European Community's Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects, grant agreements No 288777, 610402 and 671697; and the European HiPEAC Network. Peer Reviewed. Postprint (published version).
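    The abstract names three data-division strategies (Static, Dynamic, HGuided) without detail. As a rough illustration only, a static division of a kernel's index space among devices could look like the sketch below; `static_split` and its arguments are hypothetical names for this note, not part of the OmpSs extension:

```c
#include <stddef.h>

/* Static division: split the index space [0, n) into one contiguous
 * chunk per device; chunk sizes differ by at most one element.
 * The actual OmpSs extension performs this split inside the runtime. */
static void static_split(size_t n, int devices, int dev,
                         size_t *begin, size_t *end) {
    size_t base = n / devices, rem = n % devices;
    *begin = dev * base + ((size_t)dev < rem ? (size_t)dev : rem);
    *end   = *begin + base + ((size_t)dev < rem ? 1 : 0);
}
```

    Dynamic and HGuided differ in that chunks are handed out at run time as devices finish, which the static formula above does not capture.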

    Molecular data supports the inclusion of Ildobates neboti Español in Zuphiini (Coleoptera: Carabidae: Harpalinae)

    The phylogenetic relationships of Ildobates neboti Español (Coleoptera: Carabidae: Harpalinae) were investigated based on three nuclear genes (the full 18S rRNA, and a fragment each of 28S rRNA and wingless). We compiled a data set using published sequences of 32 members of Harpalinae, including one example each of Dryptini (genus Desera), Galeritini (Galerita) and Zuphiini (Thalpius), plus three Brachininae as outgroups. These three tribes form the "Dryptitae", within which various relationships of Ildobates had been proposed. Analyses of the data matrix using parsimony (with equally weighted and reweighted characters) and Bayesian posterior probabilities all support the monophyly of the three tribes in "Dryptitae", as well as a closer relationship of Ildobates with Thalpius to the exclusion of Desera plus Galerita. This confirms the previous inclusion of Ildobates among the Zuphiini, and corroborates current taxonomic classifications based on morphological criteria.

    Barcelona OpenMP tasks suite: a set of benchmarks targeting the exploitation of task parallelism in OpenMP

    Traditional parallel applications have exploited regular parallelism based on parallel loops; only a few applications exploit section parallelism. With the release of the new OpenMP specification (3.0), this programming model supports tasking. Parallel tasks allow the exploitation of irregular parallelism, but there is a lack of benchmarks exploiting tasks in OpenMP. With current (and projected) multicore architectures, which offer many more alternatives for executing parallel applications than traditional SMP machines, this kind of parallelism is increasingly important, and so is the need for a benchmark suite to evaluate it. In this paper, we motivate the need for such a suite for irregular and/or recursive task parallelism. We present our proposal, the Barcelona OpenMP Tasks Suite (BOTS), a set of applications exploiting regular and irregular parallelism based on tasks. We present an overall evaluation of the BOTS benchmarks on an Altix system and discuss some of the experiments that can be done with the different compilation and runtime alternatives of the benchmarks. Peer Reviewed. Postprint (published version).
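    The recursive task parallelism BOTS targets can be sketched with the classic Fibonacci pattern (BOTS itself includes a fib benchmark, though its exact code is not reproduced here). Without -fopenmp the pragmas are ignored and the function runs serially:

```c
/* Recursive Fibonacci with OpenMP 3.0 tasks: each recursive call becomes
 * a task, and taskwait joins the two children before combining results. */
long fib(int n) {
    long a, b;
    if (n < 2) return n;
    #pragma omp task shared(a)
    a = fib(n - 1);
    #pragma omp task shared(b)
    b = fib(n - 2);
    #pragma omp taskwait
    return a + b;
}
```

    This is exactly the irregular, recursive structure that loop-based (regular) benchmarks cannot express.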

    The OmpSs reductions model and how to deal with scatter-updates

    Scatter-updates represent a recurring algorithmic pattern in many scientific applications. Their scalable execution on modern systems is difficult due to performance limitations introduced by their irregular memory access pattern, which prevents efficient use of the memory subsystem. Further performance degradation is caused by the techniques required to eliminate potential data races, which come at the cost of overhead. A closer look at algorithmic properties, access patterns and common support techniques reveals that a one-size-fits-all solution does not exist: solutions are needed that can adapt to the individual properties of the algorithm while maintaining programming transparency. In this work we propose a solution framework that supports a broad set of techniques, provides the access-pattern analytics required for dynamic decision making, and shows what language extensions are needed to maintain programming transparency. A reference implementation in OmpSs, a task-based parallel programming model, demonstrates the programmability and scalability of this solution.
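    A minimal sketch of the scatter-update pattern and one common race-elimination technique (per-element atomic updates, one of the techniques such a framework can choose among) is shown below; the histogram example is illustrative, not taken from the paper:

```c
#define BINS 8

/* Scatter-update: indices arrive in irregular order, so two iterations
 * may hit the same bin. An atomic update per element removes the data
 * race at the cost of per-access overhead. Without -fopenmp the pragmas
 * are ignored and the loop runs serially. */
void histogram(const int *idx, int n, long hist[BINS]) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        hist[idx[i] % BINS]++;
    }
}
```

    Alternatives such as privatized copies or index-range partitioning trade memory or preprocessing for cheaper updates, which is why a dynamic, analytics-driven choice pays off.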

    Scaling irregular array-type reductions in OmpSs

    Array-type reductions represent a frequently occurring algorithmic pattern in many scientific applications. A special case occurs when array elements are accessed in a non-linear, often random manner, which makes their concurrent and scalable execution difficult. In this work we present a new approach consisting of language and runtime support that facilitates programming and delivers high scalability on modern shared-memory systems for such irregular array-type reductions. A reference implementation in OmpSs, a task-parallel programming model, shows promising results with speed-ups of up to 15x on the Intel Xeon processor.
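    The OmpSs syntax of the paper is not reproduced here; the closest standard analogue is the OpenMP 4.5 array-section reduction, sketched below as an assumption-laden illustration of what "array-type reduction" support looks like:

```c
#define NBINS 16

/* Irregular array-type reduction via an OpenMP 4.5 array-section
 * reduction clause: each thread accumulates into a private copy of
 * hist, and the copies are combined at the end. Without -fopenmp the
 * pragma is ignored and the loop runs serially. */
void histogram_red(const int *idx, int n, long hist[NBINS]) {
    #pragma omp parallel for reduction(+: hist[0:NBINS])
    for (int i = 0; i < n; i++)
        hist[idx[i] % NBINS]++;
}
```

    Privatization scales well when the reduced array is small relative to the number of updates, which matches the random-access case the abstract describes.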

    HDOT — An approach towards productive programming of hybrid applications

    The bulk synchronous parallel (BSP) communication model can hinder performance increases. This is due to the complexity of handling load imbalances, reducing the serialisation imposed by blocking communication patterns, overlapping communication with computation and, finally, dealing with increasing memory overheads. The MPI specification provides advanced features, such as non-blocking calls or shared memory, to mitigate some of these factors. However, applying these features efficiently usually requires significant changes to the application structure. Task-parallel programming models are being developed as a means of mitigating the above-mentioned issues without requiring extensive changes to the application code. In this work, we present a methodology for developing hybrid applications based on tasks, called hierarchical domain over-decomposition with tasking (HDOT). This methodology overcomes most of the issues found in MPI-only and traditional hybrid MPI+OpenMP applications. Moreover, by emphasising the reuse of data-partition schemes from the process level and applying them at the task level, it enables a natural coexistence between MPI and shared-memory programming models. The proposed methodology shows promising results in terms of programmability and performance, measured on a set of applications.

    This work has been developed with the support of the European Union H2020 programme through the INTERTWinE project (agreement number 671602); the Severo Ochoa Programme awarded by the Spanish Government (SEV-2015-0493); the Generalitat de Catalunya (contract 2017-SGR-1414); and the Spanish Ministry of Science and Innovation (TIN2015-65316-P, Computación de Altas Prestaciones VII). The authors gratefully acknowledge Dr. Arnaud Mura, CNRS researcher at Institut PPRIME in France, for the numerical tool CREAMS. Finally, the manuscript has greatly benefited from the precise comments of the reviewers. Peer Reviewed. Postprint (author's final draft).
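    The over-decomposition idea can be sketched without MPI: the process-level domain partition is reused at task level, so each rank-local subdomain becomes one task. The names and the trivial update below are illustrative assumptions, not HDOT's actual code, and the halo exchanges a real stencil needs are omitted:

```c
#define N 1000
#define PARTS 4

/* One task per subdomain of the rank-local array u; taskwait would be
 * the point where halo exchange with neighbouring ranks is overlapped.
 * Without -fopenmp the pragmas are ignored and the loop runs serially. */
void relax(double *u) {
    #pragma omp parallel
    #pragma omp single
    for (int p = 0; p < PARTS; p++) {
        int lo = p * (N / PARTS), hi = lo + N / PARTS;
        #pragma omp task firstprivate(lo, hi)
        for (int i = lo; i < hi; i++)
            u[i] = 0.5 * u[i];   /* stand-in for the real stencil update */
    }
}
```

    Because the task partition mirrors the MPI partition, communication and computation for different subdomains can overlap instead of meeting at a bulk-synchronous barrier.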